
Add Claude skill to create instrumentations #10774

Open
PerfectSlayer wants to merge 4 commits into master from bbujon/ai-toolkit


@PerfectSlayer
Contributor

What Does This Do

This PR introduces a Claude skill to create APM integrations.

Motivation

This is part of an experiment to integrate the APM Instrumentation Toolkit with dd-trace-java.

Additional Notes

I tried to build upgrade and feedback steps directly into the skill. I expect it to improve itself over time with usage.

Contributor Checklist

Jira ticket: [PROJ-IDENT]

Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.

PerfectSlayer requested a review from a team as a code owner March 9, 2026 16:56
PerfectSlayer added the "tag: no release notes" label (Changes to exclude from release notes) Mar 9, 2026
PerfectSlayer added the "tag: experimental" (Experimental changes) and "tag: ai generated" (Largely based on code generated by an AI or LLM) labels Mar 9, 2026
PerfectSlayer requested review from jordan-wong and wconti27 and removed request for manuel-alvarez-alvarez March 9, 2026 16:57
@pr-commenter

pr-commenter bot commented Mar 9, 2026

Benchmarks

Startup

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| git_branch | master | bbujon/ai-toolkit |
| git_commit_date | 1773366819 | 1773389033 |
| git_commit_sha | fd65c0a | 3cedda9 |
| release_version | 1.61.0-SNAPSHOT~fd65c0aa59 | 1.61.0-SNAPSHOT~3cedda9dc0 |

Matching parameters (identical for baseline and candidate):

| Parameter | Value |
|---|---|
| application | insecure-bank |
| ci_job_date | 1773390911 |
| ci_job_id | 1503261681 |
| ci_pipeline_id | 102316595 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| kernel_version | Linux runner-zfyrx7zua-project-304-concurrent-0-2ezmu32l 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| module | Agent |
| parent | None |

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 65 metrics, 6 unstable metrics.

Startup time reports for insecure-bank
Chart: insecure-bank - global startup overhead (candidate=1.61.0-SNAPSHOT~3cedda9dc0, baseline=1.61.0-SNAPSHOT~fd65c0aa59)
Baseline results:

| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.069 s | - |
| Agent | iast | 1.225 s | 155.918 ms (14.6%) |
| Total | tracing | 8.858 s | - |
| Total | iast | 9.554 s | 696.811 ms (7.9%) |

Candidate results:

| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.073 s | - |
| Agent | iast | 1.238 s | 164.677 ms (15.3%) |
| Total | tracing | 8.829 s | - |
| Total | iast | 9.608 s | 779.289 ms (8.8%) |
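The Δ tracing column appears to be each variant's duration minus the `tracing` variant's duration, with the percentage taken relative to `tracing`; the rows above are consistent with that reading (for example, 1224600 µs minus 1068682 µs is 155.918 ms, or about 14.6%). A minimal sketch that reproduces such a cell, assuming durations are supplied in microseconds; `DeltaSketch` is a hypothetical helper, not part of the benchmark tooling:

```java
import java.util.Locale;

// Hypothetical helper: recomputes a "Δ tracing" cell from two durations in microseconds.
final class DeltaSketch {
  static String delta(long variantMicros, long referenceMicros) {
    long diffMicros = variantMicros - referenceMicros;
    // Percentage relative to the reference (tracing) variant.
    double pct = 100.0 * diffMicros / referenceMicros;
    // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM locale.
    return String.format(Locale.ROOT, "%.3f ms (%.1f%%)", diffMicros / 1000.0, pct);
  }
}
```

For example, `DeltaSketch.delta(1224600, 1068682)` yields `155.918 ms (14.6%)`. The report trims trailing zeros (it prints `180.59 ms`), which this sketch does not do.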
insecure-bank - break down per module (candidate=1.61.0-SNAPSHOT~3cedda9dc0, baseline=1.61.0-SNAPSHOT~fd65c0aa59)

| Section | Module | Baseline | Candidate |
|---|---|---|---|
| tracing | crashtracking | 1.21 ms | 1.207 ms |
| tracing | BytebuddyAgent | 635.696 ms | 636.1 ms |
| tracing | AgentMeter | 29.37 ms | 29.401 ms |
| tracing | GlobalTracer | 258.772 ms | 259.708 ms |
| tracing | AppSec | 32.031 ms | 32.029 ms |
| tracing | Debugger | 59.507 ms | 59.403 ms |
| tracing | Remote Config | 624.973 µs | 621.433 µs |
| tracing | Telemetry | 8.861 ms | 8.725 ms |
| tracing | Flare Poller | 6.414 ms | 9.524 ms |
| iast | crashtracking | 1.19 ms | 1.201 ms |
| iast | BytebuddyAgent | 793.497 ms | 803.739 ms |
| iast | AgentMeter | 11.311 ms | 11.651 ms |
| iast | GlobalTracer | 247.36 ms | 249.343 ms |
| iast | IAST | 25.375 ms | 25.449 ms |
| iast | AppSec | 26.623 ms | 26.735 ms |
| iast | Debugger | 63.119 ms | 62.945 ms |
| iast | Remote Config | 515.895 µs | 520.781 µs |
| iast | Telemetry | 14.773 ms | 14.993 ms |
| iast | Flare Poller | 4.849 ms | 4.887 ms |
Startup time reports for petclinic
Chart: petclinic - global startup overhead (candidate=1.61.0-SNAPSHOT~3cedda9dc0, baseline=1.61.0-SNAPSHOT~fd65c0aa59)
Baseline results:

| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.063 s | - |
| Agent | appsec | 1.243 s | 180.59 ms (17.0%) |
| Agent | iast | 1.229 s | 165.82 ms (15.6%) |
| Agent | profiling | 1.179 s | 115.789 ms (10.9%) |
| Total | tracing | 11.137 s | - |
| Total | appsec | 11.169 s | 31.49 ms (0.3%) |
| Total | iast | 11.358 s | 220.149 ms (2.0%) |
| Total | profiling | 10.962 s | -175.467 ms (-1.6%) |

Candidate results:

| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.058 s | - |
| Agent | appsec | 1.244 s | 186.003 ms (17.6%) |
| Agent | iast | 1.224 s | 165.782 ms (15.7%) |
| Agent | profiling | 1.181 s | 122.891 ms (11.6%) |
| Total | tracing | 11.0 s | - |
| Total | appsec | 11.111 s | 111.356 ms (1.0%) |
| Total | iast | 11.326 s | 326.187 ms (3.0%) |
| Total | profiling | 11.113 s | 112.489 ms (1.0%) |
petclinic - break down per module (candidate=1.61.0-SNAPSHOT~3cedda9dc0, baseline=1.61.0-SNAPSHOT~fd65c0aa59)

| Section | Module | Baseline | Candidate |
|---|---|---|---|
| tracing | crashtracking | 1.188 ms | 1.219 ms |
| tracing | BytebuddyAgent | 628.989 ms | 627.661 ms |
| tracing | AgentMeter | 29.583 ms | 29.09 ms |
| tracing | GlobalTracer | 258.589 ms | 256.747 ms |
| tracing | AppSec | 31.82 ms | 31.334 ms |
| tracing | Debugger | 59.883 ms | 59.47 ms |
| tracing | Remote Config | 616.142 µs | 621.171 µs |
| tracing | Telemetry | 8.768 ms | 8.692 ms |
| tracing | Flare Poller | 7.308 ms | 7.117 ms |
| appsec | crashtracking | 1.205 ms | 1.186 ms |
| appsec | BytebuddyAgent | 656.683 ms | 656.33 ms |
| appsec | AgentMeter | 12.027 ms | 12.054 ms |
| appsec | GlobalTracer | 257.691 ms | 258.008 ms |
| appsec | IAST | 23.969 ms | 23.999 ms |
| appsec | AppSec | 176.813 ms | 177.393 ms |
| appsec | Debugger | 65.712 ms | 65.724 ms |
| appsec | Remote Config | 563.938 µs | 569.765 µs |
| appsec | Telemetry | 8.932 ms | 8.927 ms |
| appsec | Flare Poller | 3.628 ms | 3.62 ms |
| iast | crashtracking | 1.202 ms | 1.19 ms |
| iast | BytebuddyAgent | 797.897 ms | 793.385 ms |
| iast | AgentMeter | 11.314 ms | 11.268 ms |
| iast | GlobalTracer | 247.073 ms | 246.706 ms |
| iast | IAST | 25.086 ms | 25.085 ms |
| iast | AppSec | 26.431 ms | 26.405 ms |
| iast | Debugger | 63.249 ms | 64.903 ms |
| iast | Remote Config | 521.056 µs | 514.801 µs |
| iast | Telemetry | 14.863 ms | 13.8 ms |
| iast | Flare Poller | 4.865 ms | 4.564 ms |
| profiling | crashtracking | 1.168 ms | 1.164 ms |
| profiling | BytebuddyAgent | 680.276 ms | 680.985 ms |
| profiling | AgentMeter | 8.635 ms | 8.628 ms |
| profiling | GlobalTracer | 215.035 ms | 215.482 ms |
| profiling | AppSec | 31.82 ms | 31.943 ms |
| profiling | Debugger | 62.814 ms | 65.238 ms |
| profiling | Remote Config | 586.869 µs | 588.955 µs |
| profiling | Telemetry | 9.777 ms | 8.236 ms |
| profiling | Flare Poller | 4.235 ms | 3.495 ms |
| profiling | ProfilingAgent | 93.557 ms | 94.249 ms |
| profiling | Profiling | 94.128 ms | 94.818 ms |

Load

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| git_branch | master | bbujon/ai-toolkit |
| git_commit_date | 1773366819 | 1773389033 |
| git_commit_sha | fd65c0a | 3cedda9 |
| release_version | 1.61.0-SNAPSHOT~fd65c0aa59 | 1.61.0-SNAPSHOT~3cedda9dc0 |

Matching parameters (identical for baseline and candidate):

| Parameter | Value |
|---|---|
| application | insecure-bank |
| ci_job_date | 1773391398 |
| ci_job_id | 1503261684 |
| ci_pipeline_id | 102316595 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| kernel_version | Linux runner-zfyrx7zua-project-304-concurrent-1-twr0o6mn 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |

Summary

Found 2 performance improvements and 2 performance regressions! Performance is the same for 16 metrics, 16 unstable metrics.

| Scenario | Δ mean agg_http_req_duration_p50 | Δ mean agg_http_req_duration_p95 | Δ mean throughput | Candidate mean agg_http_req_duration_p50 | Candidate mean agg_http_req_duration_p95 | Candidate mean throughput | Baseline mean agg_http_req_duration_p50 | Baseline mean agg_http_req_duration_p95 | Baseline mean throughput |
|---|---|---|---|---|---|---|---|---|---|
| scenario:load:insecure-bank:iast:high_load | better [-245.931µs; -128.690µs] or [-9.424%; -4.931%] | same [-427.316µs; +23.293µs] or [-5.719%; +0.312%] | unstable [-91.069op/s; +220.444op/s] or [-6.574%; +15.913%] | 2.422ms | 7.270ms | 1450.031op/s | 2.610ms | 7.472ms | 1385.344op/s |
| scenario:load:petclinic:profiling:high_load | worse [+0.792ms; +1.914ms] or [+4.324%; +10.447%] | worse [+0.990ms; +2.457ms] or [+3.335%; +8.276%] | unstable [-39.620op/s; +8.308op/s] or [-15.820%; +3.317%] | 19.675ms | 31.413ms | 234.781op/s | 18.322ms | 29.689ms | 250.438op/s |
| scenario:load:petclinic:no_agent:high_load | better [-1.957ms; -0.401ms] or [-10.370%; -2.125%] | unstable [-3.423ms; -0.146ms] or [-10.914%; -0.467%] | unstable [-11.184op/s; +41.684op/s] or [-4.622%; +17.227%] | 17.695ms | 29.576ms | 257.219op/s | 18.874ms | 31.360ms | 241.969op/s |
Request duration reports for petclinic
Chart: petclinic - request duration [CI 0.99] (candidate=1.61.0-SNAPSHOT~3cedda9dc0, baseline=1.61.0-SNAPSHOT~fd65c0aa59)
Baseline results:

| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 19.291 ms [19.094 ms, 19.488 ms] | - |
| appsec | 19.493 ms [19.289 ms, 19.698 ms] | 202.755 µs (1.1%) |
| code_origins | 17.702 ms [17.528 ms, 17.877 ms] | -1.589 ms (-8.2%) |
| iast | 17.85 ms [17.67 ms, 18.03 ms] | -1.441 ms (-7.5%) |
| profiling | 18.639 ms [18.451 ms, 18.827 ms] | -651.856 µs (-3.4%) |
| tracing | 17.787 ms [17.61 ms, 17.964 ms] | -1.504 ms (-7.8%) |

Candidate results:

| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 18.143 ms [17.957 ms, 18.329 ms] | - |
| appsec | 19.635 ms [19.433 ms, 19.838 ms] | 1.493 ms (8.2%) |
| code_origins | 17.797 ms [17.618 ms, 17.976 ms] | -346.063 µs (-1.9%) |
| iast | 18.261 ms [18.08 ms, 18.443 ms] | 118.615 µs (0.7%) |
| profiling | 19.885 ms [19.682 ms, 20.088 ms] | 1.743 ms (9.6%) |
| tracing | 17.802 ms [17.623 ms, 17.982 ms] | -340.51 µs (-1.9%) |
Request duration reports for insecure-bank
Chart: insecure-bank - request duration [CI 0.99] (candidate=1.61.0-SNAPSHOT~3cedda9dc0, baseline=1.61.0-SNAPSHOT~fd65c0aa59)
Baseline results:

| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 1.194 ms [1.183 ms, 1.206 ms] | - |
| iast | 3.305 ms [3.259 ms, 3.351 ms] | 2.111 ms (176.8%) |
| iast_FULL | 5.7 ms [5.644 ms, 5.757 ms] | 4.506 ms (377.3%) |
| iast_GLOBAL | 3.388 ms [3.347 ms, 3.43 ms] | 2.194 ms (183.7%) |
| profiling | 2.115 ms [2.095 ms, 2.136 ms] | 921.004 µs (77.1%) |
| tracing | 1.758 ms [1.743 ms, 1.773 ms] | 563.315 µs (47.2%) |

Candidate results:

| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 1.169 ms [1.158 ms, 1.18 ms] | - |
| iast | 3.155 ms [3.111 ms, 3.199 ms] | 1.986 ms (169.8%) |
| iast_FULL | 5.794 ms [5.737 ms, 5.852 ms] | 4.625 ms (395.5%) |
| iast_GLOBAL | 3.544 ms [3.483 ms, 3.605 ms] | 2.375 ms (203.1%) |
| profiling | 2.176 ms [2.156 ms, 2.196 ms] | 1.007 ms (86.1%) |
| tracing | 1.77 ms [1.756 ms, 1.784 ms] | 600.506 µs (51.4%) |

Dacapo

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| git_branch | master | bbujon/ai-toolkit |
| git_commit_date | 1773366819 | 1773389033 |
| git_commit_sha | fd65c0a | 3cedda9 |
| release_version | 1.61.0-SNAPSHOT~fd65c0aa59 | 1.61.0-SNAPSHOT~3cedda9dc0 |

Matching parameters (identical for baseline and candidate):

| Parameter | Value |
|---|---|
| application | biojava |
| ci_job_date | 1773391076 |
| ci_job_id | 1503261687 |
| ci_pipeline_id | 102316595 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| kernel_version | Linux runner-zfyrx7zua-project-304-concurrent-0-b32iywje 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metric.

Execution time for tomcat
Chart: tomcat - execution time [CI 0.99] (candidate=1.61.0-SNAPSHOT~3cedda9dc0, baseline=1.61.0-SNAPSHOT~fd65c0aa59)
Baseline results:

| Variant | Execution time [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 1.48 ms [1.468 ms, 1.492 ms] | - |
| appsec | 3.747 ms [3.531 ms, 3.964 ms] | 2.267 ms (153.2%) |
| iast | 2.261 ms [2.192 ms, 2.331 ms] | 781.341 µs (52.8%) |
| iast_GLOBAL | 2.308 ms [2.238 ms, 2.378 ms] | 827.78 µs (55.9%) |
| profiling | 2.108 ms [2.051 ms, 2.164 ms] | 627.549 µs (42.4%) |
| tracing | 2.059 ms [2.006 ms, 2.113 ms] | 579.384 µs (39.1%) |

Candidate results:

| Variant | Execution time [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 1.473 ms [1.462 ms, 1.485 ms] | - |
| appsec | 3.823 ms [3.6 ms, 4.045 ms] | 2.349 ms (159.5%) |
| iast | 2.264 ms [2.195 ms, 2.334 ms] | 791.251 µs (53.7%) |
| iast_GLOBAL | 2.296 ms [2.226 ms, 2.365 ms] | 822.557 µs (55.8%) |
| profiling | 2.094 ms [2.038 ms, 2.15 ms] | 620.603 µs (42.1%) |
| tracing | 2.058 ms [2.005 ms, 2.111 ms] | 584.925 µs (39.7%) |
Execution time for biojava
Chart: biojava - execution time [CI 0.99] (candidate=1.61.0-SNAPSHOT~3cedda9dc0, baseline=1.61.0-SNAPSHOT~fd65c0aa59)
Baseline results:

| Variant | Execution time [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 15.521 s [15.521 s, 15.521 s] | - |
| appsec | 14.776 s [14.776 s, 14.776 s] | -745.0 ms (-4.8%) |
| iast | 18.254 s [18.254 s, 18.254 s] | 2.733 s (17.6%) |
| iast_GLOBAL | 17.876 s [17.876 s, 17.876 s] | 2.355 s (15.2%) |
| profiling | 15.008 s [15.008 s, 15.008 s] | -513.0 ms (-3.3%) |
| tracing | 14.917 s [14.917 s, 14.917 s] | -604.0 ms (-3.9%) |

Candidate results:

| Variant | Execution time [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 15.462 s [15.462 s, 15.462 s] | - |
| appsec | 15.09 s [15.09 s, 15.09 s] | -372.0 ms (-2.4%) |
| iast | 18.213 s [18.213 s, 18.213 s] | 2.751 s (17.8%) |
| iast_GLOBAL | 17.154 s [17.154 s, 17.154 s] | 1.692 s (10.9%) |
| profiling | 15.234 s [15.234 s, 15.234 s] | -228.0 ms (-1.5%) |
| tracing | 14.926 s [14.926 s, 14.926 s] | -536.0 ms (-3.5%) |

Comment on lines +37 to +44
## Step 2 – Clarify the task

If the user has not already provided all of the following, ask before proceeding:

- **Framework name** and **minimum supported version** (e.g. `okhttp-3.0`)
- **Target class(es) and method(s)** to instrument (fully qualified class names preferred)
- **Target system**: one of `Tracing`, `Profiling`, `AppSec`, `Iast`, `CiVisibility`, `Usm`, `ContextTracking`
- **Whether this is a bootstrap instrumentation** (affects allowed imports)
Contributor

I'm curious, genuine question: do you know whether asking the user a question works in the current state of the skill, given that AskUserQuestion is not in allowed-tools?

Contributor Author

I think that if it is not in the allowed tools, it comes down to the security rules and the tools the user has allowed; otherwise, Claude will ask for permission to use it. It's not "allowed by default", but it might be useful to add it nonetheless 🤔 Similarly, the skill will need web search, but I don't want to enable it by default for security reasons.
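The Step 2 inputs amount to a completeness check that runs before any code is written. A minimal sketch, assuming a hypothetical `InstrumentationRequest` holder and `TargetSystem` enum (neither is part of dd-trace-java; the enum constants are simply the values the step lists), shows how the still-missing answers could be turned into questions for the user:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical enum mirroring the Step 2 list; not dd-trace-java's own type.
enum TargetSystem { Tracing, Profiling, AppSec, Iast, CiVisibility, Usm, ContextTracking }

// Hypothetical holder for the Step 2 inputs; null or empty means "not provided yet".
final class InstrumentationRequest {
  final String frameworkName;      // e.g. "okhttp"
  final String minimumVersion;     // e.g. "3.0"
  final List<String> targets;      // fully qualified "class#method" strings
  final TargetSystem targetSystem;
  final Boolean bootstrap;         // null until answered; affects allowed imports

  InstrumentationRequest(String frameworkName, String minimumVersion, List<String> targets,
      TargetSystem targetSystem, Boolean bootstrap) {
    this.frameworkName = frameworkName;
    this.minimumVersion = minimumVersion;
    this.targets = targets;
    this.targetSystem = targetSystem;
    this.bootstrap = bootstrap;
  }

  // Assumed "<framework>-<version>" naming, based only on the okhttp-3.0 example.
  String moduleName() {
    return frameworkName + "-" + minimumVersion;
  }

  // Everything the skill would still have to ask before proceeding.
  List<String> missingInputs() {
    List<String> missing = new ArrayList<>();
    if (isBlank(frameworkName)) missing.add("framework name");
    if (isBlank(minimumVersion)) missing.add("minimum supported version");
    if (targets == null || targets.isEmpty()) missing.add("target class(es) and method(s)");
    if (targetSystem == null) missing.add("target system");
    if (bootstrap == null) missing.add("bootstrap instrumentation or not");
    return missing;
  }

  private static boolean isBlank(String s) {
    return s == null || s.trim().isEmpty();
  }
}
```

With only the framework and version known, `missingInputs()` returns the three remaining questions, and an empty list signals that implementation can start.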

Comment on lines +20 to +22
1. `docs/how_instrumentations_work.md` — full reference (types, methods, advice, helpers, context stores, decorators)
2. `docs/add_new_instrumentation.md` — step-by-step walkthrough
3. `docs/how_to_test.md` — test types and how to run them
Contributor

Generally, for reference files, I advise using proper markdown links. It doesn't help the LLM, but it does help engineers navigate quickly to the files. Just a suggestion 😄

Contributor Author

> I advise to use proper markdown linking

So would you use something like this?

1. [docs/how_instrumentations_work.md](docs/how_instrumentations_work.md): full reference (types, methods, advice, helpers, context stores, decorators)
2. [docs/add_new_instrumentation.md](docs/add_new_instrumentation.md): step-by-step walkthrough
3. [docs/how_to_test.md](docs/how_to_test.md): test types and how to run them


Before writing any code, read all three files in full:

1. `docs/how_instrumentations_work.md` — full reference (types, methods, advice, helpers, context stores, decorators)
Contributor

I wonder how it will perform, given that this reference is almost 1k lines long. I'm not sure, tbh.

Contributor Author

Good question... I'm not sure either. It ingests many documentation files and instrumentations before starting the implementation, but it seems to do so using subagents, so we might be in the clear regarding context management.

Here is a report from re-creating the Feign instrumentation:

Direct reads (Read tool)

  ┌──────────────────────────────────────────────────────────────┬───────────────────────────┐
  │                             File                             │           Lines           │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ google-http-client-1.19/GoogleHttpClientInstrumentation.java │ 121                       │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ google-http-client-1.19/GoogleHttpClientDecorator.java       │ 68                        │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ google-http-client-1.19/HeadersInjectAdapter.java            │ 16                        │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ google-http-client-1.19/build.gradle                         │ 21                        │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ google-http-client-1.19/AbstractGoogleHttpClientTest.groovy  │ 53                        │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ google-http-client-1.19/GoogleHttpClientTest.groovy          │ 21                        │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ pekko-http-1.0/HttpHeaderSubclassesInstrumentation.java      │ 60 (partial)              │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ javax-websocket-1.0/SessionInstrumentation.java              │ 60 (partial)              │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ apache-httpclient-4.0/HelperMethods.java                     │ 76                        │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ apache-httpclient-4.0/ApacheHttpClientInstrumentation.java   │ 277                       │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ settings.gradle.kts                                          │ 10 (partial)              │
  ├──────────────────────────────────────────────────────────────┼───────────────────────────┤
  │ Total                                                        │ 783 lines across 11 files │
  └──────────────────────────────────────────────────────────────┴───────────────────────────┘

  Via grep/bash (content snippets)

  - HttpClientDecorator.java — abstract method signatures (~15 lines)
  - HttpClientTest.groovy — abstract method signatures (~10 lines)
  - Various directory listings and module path lookups

  Via subagents (delegated research)

  - Feign API research agent — 13 tool calls, web searches on Feign's API, class hierarchy, Maven coordinates, and version history
  - HTTP client patterns agent — 34 tool calls, read OkHttp, Apache HttpClient, and Google HTTP Client instrumentation files in full (~1,500 estimated lines across ~12 files)

  Summary

  ┌───────────────────┬───────────┬──────────────┐
  │      Source       │   Files   │    ~Lines    │
  ├───────────────────┼───────────┼──────────────┤
  │ Direct reads      │ 11        │ 783          │
  ├───────────────────┼───────────┼──────────────┤
  │ Subagent reads    │ ~12       │ ~1,500       │
  ├───────────────────┼───────────┼──────────────┤
  │ Web/docs research │ —         │ —            │
  ├───────────────────┼───────────┼──────────────┤
  │ Total             │ ~23 files │ ~2,300 lines │
  └───────────────────┴───────────┴──────────────┘

  The subagents did the bulk of the pattern research, freeing the main context for writing the actual implementation.

- `@Advice.Return` — the return value (exit only)
- `@Advice.Thrown` — the thrown exception (exit only)
- `@Advice.Enter` — the return value of the enter method (exit only)
- Use `CallDepthThreadLocalMap` to guard against recursive instrumentation of the same method
Contributor

@mcculls commented Mar 10, 2026

Add: "- Do not use lambdas in advice methods"

EDIT: this should go in the "Must NOT do" section below...
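The `CallDepthThreadLocalMap` bullet describes a re-entrancy guard: only the outermost call of a method that re-enters itself should create a span. The sketch below illustrates that idea without ByteBuddy or dd-trace-java; `CallDepth` is a hypothetical stand-in, not the real class or its API, and the "enter" and "exit" logic is written inline in `ReentrantClient` (also hypothetical) where advice would normally be woven. It avoids lambdas, in line with the suggestion above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-thread, per-key call-depth counter.
final class CallDepth {
  private static final ThreadLocal<Map<Class<?>, Integer>> DEPTHS = new ThreadLocal<>();

  private static Map<Class<?>, Integer> depths() {
    Map<Class<?>, Integer> map = DEPTHS.get();
    if (map == null) {
      map = new HashMap<>();
      DEPTHS.set(map);
    }
    return map;
  }

  // Returns the depth before incrementing, so 0 means "outermost call".
  static int increment(Class<?> key) {
    Map<Class<?>, Integer> map = depths();
    int depth = map.getOrDefault(key, 0);
    map.put(key, depth + 1);
    return depth;
  }

  // Only the outermost exit calls this, so nested calls never decrement.
  static void reset(Class<?> key) {
    depths().remove(key);
  }
}

// Simulated target whose execute() re-enters itself, e.g. via a delegating overload.
final class ReentrantClient {
  static int spansStarted = 0;

  static void execute(int nestedCalls) {
    // "Enter advice": only the outermost call would start a span.
    boolean outermost = CallDepth.increment(ReentrantClient.class) == 0;
    if (outermost) {
      spansStarted++;
    }
    try {
      if (nestedCalls > 0) {
        execute(nestedCalls - 1);
      }
    } finally {
      // "Exit advice": only the outermost call resets the counter.
      if (outermost) {
        CallDepth.reset(ReentrantClient.class);
      }
    }
  }
}
```

Calling `ReentrantClient.execute(3)` counts one span, not four, and because the outermost exit resets the counter, a later top-level call starts a fresh span.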

Enter method:
1. `AgentSpan span = startSpan(DECORATE.operationName(), ...)`
2. `DECORATE.afterStart(span)` + set domain-specific tags
3. `AgentScope scope = activateSpan(span)` — return or store via `@Advice.Local`
Contributor

Should we push it towards the Context API, as that will be preferred going forward?

`ContextScope scope = span.attach()`

Contributor Author

I think we should revisit our docs (/docs) first, and then reflect the upgrade in the skill. WDYT?
Upgrading the codebase would also help: the skill relies heavily on reading other instrumentations as examples, since it has no reference document or reference codebase.

Contributor Author

I listed the files it reads to gather the knowledge needed to re-create the Feign instrumentation here: #10774 (comment)
You can see it relies on other instrumentations to know how to proceed, so cleaning up our codebase or providing reference material to the skill would help most, I guess.
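The enter steps quoted above, together with the exit order in the checklist further down (onError, then beforeFinish, then finish, then close), define a fixed span lifecycle. A minimal sketch of that ordering follows; `SketchSpan`, `SketchScope`, `SketchDecorator`, and `LifecycleAdvice` are hypothetical recording stand-ins, not dd-trace-java's `AgentSpan`, `AgentScope`, or decorator types, and the reviewer's suggested `span.attach()` Context API variant of the activation step is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Records the name of each lifecycle step in the order it happens.
final class Recorder {
  static final List<String> CALLS = new ArrayList<>();
}

final class SketchSpan {
  SketchSpan() { Recorder.CALLS.add("startSpan"); }
  void finish() { Recorder.CALLS.add("finish"); }
}

final class SketchScope implements AutoCloseable {
  final SketchSpan span;

  SketchScope(SketchSpan span) {
    this.span = span;
    Recorder.CALLS.add("activateSpan");
  }

  @Override
  public void close() { Recorder.CALLS.add("close"); }
}

final class SketchDecorator {
  void afterStart(SketchSpan span) { Recorder.CALLS.add("afterStart"); }
  void onError(SketchSpan span, Throwable t) { if (t != null) Recorder.CALLS.add("onError"); }
  void beforeFinish(SketchSpan span) { Recorder.CALLS.add("beforeFinish"); }
}

final class LifecycleAdvice {
  static final SketchDecorator DECORATE = new SketchDecorator();

  // Enter: startSpan -> afterStart (+ domain-specific tags) -> activateSpan
  static SketchScope onEnter() {
    SketchSpan span = new SketchSpan();
    DECORATE.afterStart(span);
    return new SketchScope(span);
  }

  // Exit, in the checklist's order: onError -> beforeFinish -> finish -> close
  static void onExit(SketchScope scope, Throwable thrown) {
    if (scope == null) {
      return; // enter was skipped (e.g. by a call-depth guard)
    }
    DECORATE.onError(scope.span, thrown);
    DECORATE.beforeFinish(scope.span);
    scope.span.finish();
    scope.close();
  }
}
```

Running `onEnter()` and then `onExit(scope, new RuntimeException())` records all seven steps in order; with a `null` throwable, the `onError` step is skipped.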


## Step 12 – Retrospective: update this skill with what was learned

After the instrumentation is complete (or abandoned), review the full session and improve this skill for future use.
Contributor

I haven't seen this type of instruction before and I'm curious how it'll perform.

My one concern is that we are instructing it to update the instrumentation with lessons learned before any human review is in the loop. Could that be too early?

I like the idea though and would like to see it in action, especially as we are in prototyping stages.

Contributor Author

> My one concern with this is that we are instructing it to update the instrumentation with lessons learned before any human review is in the loop, could be too early?

It's interesting to see the changes it makes in response to the instrumentation challenges it faces.
I have not included its discoveries and changes so far because it feels too early, especially without golden instrumentations and an easy way to compare against their output.

Comment on lines +196 to +206
- [ ] `settings.gradle.kts` entry added in alphabetical order
- [ ] `build.gradle` has `compileOnly` deps and `muzzle` directives with `assertInverse = true`
- [ ] `@AutoService(InstrumenterModule.class)` annotation present on the module class
- [ ] `helperClassNames()` lists ALL referenced helpers (including inner, anonymous, and enum synthetic classes)
- [ ] Advice methods are `static` with `@Advice.OnMethodEnter` / `@Advice.OnMethodExit` annotations
- [ ] `suppress = Throwable.class` on enter/exit (unless the hooked method is a constructor)
- [ ] No logger field in the Advice class or InstrumenterModule class
- [ ] No `inline=false` left in production code
- [ ] No `java.util.logging.*` / `java.nio.*` / `javax.management.*` in bootstrap path
- [ ] Span lifecycle order is correct: startSpan → afterStart → activateSpan (enter); onError → beforeFinish → finish → close (exit)
- [ ] Muzzle passes
Contributor

Can we mention the new Context API and its reference, noting that the Context API must be used and that there may be limited examples? New integrations can be based on reference integrations but should still use the new Context API.

Contributor Author

> but still should use the new context api.

To clarify: using the new Context API in an instrumentation that depends on other instrumentations still using the legacy API may make the generated instrumentation fail. It is not a matter of always applying it to make things work; it depends on how the instrumentations interact with each other. In this case, the LLM seems to do a good job, on average, of finding the most relevant working API to use.
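The `helperClassNames()` item in the checklist above is easy to get wrong because every nested, anonymous, or synthetic class compiles to its own class file with a `$` in its binary name, and each one presumably needs its own entry. The sketch below is independent of dd-trace-java and uses a hypothetical `FeignHelperSketch` class to show the binary names javac produces for a nested class, a nested enum, and an anonymous class. A `switch` over an enum can additionally make javac emit a synthetic `$1`-style holder class, which is not shown here.

```java
// Hypothetical helper with a nested class, a nested enum, and an anonymous class.
// Each compiles to a separate class file whose binary name contains '$'.
final class FeignHelperSketch {
  static final class HeadersAdapter {}  // FeignHelperSketch$HeadersAdapter

  enum Phase { REQUEST, RESPONSE }      // FeignHelperSketch$Phase

  static Runnable callback() {
    return new Runnable() {             // anonymous class: FeignHelperSketch$1
      @Override
      public void run() {}
    };
  }

  // Binary names as reported by Class.getName().
  static String[] compiledNames() {
    return new String[] {
      FeignHelperSketch.class.getName(),
      HeadersAdapter.class.getName(),
      Phase.class.getName(),
      callback().getClass().getName()
    };
  }
}
```

Listing only `FeignHelperSketch` would leave the other three classes out, which is exactly the kind of gap the checklist item warns about.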

Comment on lines +207 to +209
- [ ] Instrumentation tests pass
- [ ] `latestDepTest` passes
- [ ] `spotlessCheck` passes
Contributor

Can we mention the new Context API and its reference, noting that the Context API must be used and that there may be limited examples? New integrations can be based on reference integrations but should still use the new Context API.


Labels

- tag: ai generated (Largely based on code generated by an AI or LLM)
- tag: experimental (Experimental changes)
- tag: no release notes (Changes to exclude from release notes)


4 participants